As this is the very first exercise in this workshop, it is not that hard. Its purpose is to get you used to the exercise format and, more importantly, to give you a first feeling for working in the tidyverse.
Just two short notes on working with the exercise files in this workshop:
We would like to ask you to solve all tasks by writing them into your own R script files. This ensures that all of your solutions are reproducible, and that you can (re-)use solutions from earlier exercises in later ones.
All exercises and their solutions ‘assume’ they are in the ./solutions folder of this repository. This way they can make use of files in other folders using relative paths. In order for your scripts to run properly, we suggest that you create (and save) them either in the exercises or solutions folder and set your working directory for the exercises accordingly (you can check your working directory with getwd() and change it with setwd()).
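Checking and changing the working directory could look like the following minimal sketch. The temporary folder is only a stand-in for the repository's solutions folder, and the names old_wd and target are hypothetical; in practice you would pass the path of your local copy of the repository to setwd().

```r
# Remember the current working directory so the change can be undone later.
old_wd <- getwd()

# Hypothetical target: a temporary folder stands in for ./solutions.
target <- file.path(tempdir(), "solutions")
dir.create(target, showWarnings = FALSE)

# Switch to the target folder and confirm the change.
setwd(target)
basename(getwd())

# Relative paths such as "../data" are now resolved against the new directory.
normalizePath("..", mustWork = TRUE)

# Restore the original working directory.
setwd(old_wd)
```

Restoring the old directory at the end is optional in an exercise script, but it keeps the session in the state it started in.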
Again, the following exercise is really short and just supposed to let you play around with some pipes and tibbles. It’s a mini-exercise!
First things first: To work with the ‘tidyverse’, we have to have access to its packages.
Load the tidyverse library.
If the tidyverse library has not been installed yet, you can install it with the command install.packages("tidyverse").
if (!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)
After successfully loading the tidyverse library we turn to the magic world of pipes. Remember, pipes are a convenient way to disentangle nested R functions and to write cleaner R code. First, have a look at the code in the following block:
mean(sqrt(as.numeric(read.csv2("../data/titanic/titanic.csv", sep = ",")$Fare)))
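Read from the inside out, the code does the following: the titanic data are imported with read.csv2(), the Fare variable is extracted using the $ operator, its values are converted to numeric with as.numeric(), their square roots are taken with sqrt(), and the mean of the result is computed with mean().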
Using the commands in such a way makes the code somewhat difficult to read and understand. You have already learned that pipes provide a straightforward approach to address this issue.
Rewrite the code above using pipes. Within a pipe, a single column can be extracted with .$col_name.
read.csv2("../data/titanic/titanic.csv", sep = ",") %>%
.$Fare %>%
as.numeric() %>%
sqrt() %>%
mean()
## [1] 10.46045
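To convince yourself that the nested and the piped version compute the same thing, you can compare them on a dataset that ships with R. The following is a small sketch that assumes the built-in mtcars data and its mpg column as stand-ins for the titanic data and Fare; the object names nested and piped are hypothetical.

```r
library(magrittr)  # provides %>%; it is also attached by library(tidyverse)

# Nested version: has to be read from the inside out.
nested <- mean(sqrt(as.numeric(mtcars$mpg)))

# Piped version: reads from top to bottom, one step per line.
piped <- mtcars %>%
  .$mpg %>%
  as.numeric() %>%
  sqrt() %>%
  mean()

# Both versions apply the same functions in the same order.
identical(nested, piped)
```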
As we have already learned, the tidyverse is not only about pipes; it is also about specific data formats. The default data format in the tidyverse is the tibble. In the previous task, you already imported the titanic data, but in the standard data.frame format.
Import the titanic dataset and convert it to a tibble.
Base-R’s read.csv2() is your friend. Also, you may want to do it all at once in one pipe.
titanic_tibble <-
read.csv2("../data/titanic/titanic.csv", sep = ",") %>%
as_tibble()
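Whether an object is a tibble or a plain data.frame can be checked directly. The sketch below assumes the built-in mtcars data as a stand-in for the titanic data; the object names and the column name "model" are hypothetical.

```r
library(tibble)

df  <- head(mtcars)   # a plain base-R data.frame
tbl <- as_tibble(df)  # the same data as a tibble

is_tibble(df)   # FALSE
is_tibble(tbl)  # TRUE

# A tibble is still a data.frame, with additional classes on top.
class(tbl)

# as_tibble() drops row names by default; the rownames argument
# keeps them as a regular column instead.
tbl_named <- as_tibble(df, rownames = "model")
names(tbl_named)[1]
```

The last step matters for datasets such as mtcars that store information in the row names, which a plain as_tibble() call would discard.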
Now, look at the following data.frame. It’s been created with the standard base-R tools. The tibble package also provides the tribble() command to create small data tables as tibbles from scratch.
## day amount_coffee words_written
## 1 1 2 245
## 2 2 5 691
## 3 3 1 10
## 4 4 8 2100
## 5 5 4 490
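The code that created this data.frame is not shown here. One way it could have been built with base R is sketched below; the object name coffee_df and the integer type of the day column are assumptions.

```r
# Column-wise construction with base R's data.frame().
coffee_df <- data.frame(
  day           = 1:5,
  amount_coffee = c(2, 5, 1, 8, 4),
  words_written = c(245, 691, 10, 2100, 490)
)

coffee_df
```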
Use the tribble() function to recreate the above data.frame as a tibble.
Column names are defined with a leading ~ (tilde).
tribble(
~day, ~amount_coffee, ~words_written,
1, 2, 245,
2, 5, 691,
3, 1, 10,
4, 8, 2100,
5, 4, 490
)
## # A tibble: 5 x 3
## day amount_coffee words_written
## <dbl> <dbl> <dbl>
## 1 1 2 245
## 2 2 5 691
## 3 3 1 10
## 4 4 8 2100
## 5 5 4 490
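tribble() defines the data row by row, which is easy to read for small tables. The tibble package also offers tibble(), which defines the same data column by column. The following sketch compares the two; the object names by_row and by_col are hypothetical.

```r
library(tibble)

# Row-wise definition: each line of code is one observation.
by_row <- tribble(
  ~day, ~amount_coffee, ~words_written,
  1, 2, 245,
  2, 5, 691,
  3, 1, 10,
  4, 8, 2100,
  5, 4, 490
)

# Column-wise definition: each argument is one variable.
by_col <- tibble(
  day           = c(1, 2, 3, 4, 5),
  amount_coffee = c(2, 5, 1, 8, 4),
  words_written = c(245, 691, 10, 2100, 490)
)

# Both approaches should describe the same table.
all.equal(by_row, by_col)
```

Which form to prefer is mostly a matter of readability: the row-wise layout mirrors the printed table, while the column-wise layout is more convenient when the columns already exist as vectors.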